164 ◾ Bioinformatics
region consists of codons that are translated into amino acids. The major difference
between prokaryotic and eukaryotic genes is that the coding regions of a eukaryotic gene
(known as exons) are interrupted by non-coding regions (known as introns),
which makes gene expression in eukaryotes more complex. In eukaryotes, once the pre-mRNA
is transcribed, the introns are removed by splicing and the final mRNA transcript
contains an open reading frame (ORF) that can be translated into a polypeptide. A single
eukaryotic gene may code for several proteins through alternative splicing, which produces
multiple protein variants called isoforms. In eukaryotic pre-mRNA, introns are spliced out at
specific splicing sites found at the 5′ and 3′ ends of introns. Most commonly, the intron
sequence that is spliced out begins with the dinucleotide GT (donor splice site) at its 5′
end and ends with AG (acceptor splice site) at its 3′ end [1]. Very rarely, splice sites are
found with AT at the 5′ end and AC at the 3′ end of an intron, and also with GC (5′ end) and AG
(3′ end) of an intron [2]. On the other hand, a prokaryotic gene consists of a continuous
sequence of codons, which makes gene expression in prokaryotes simpler. A typical functional
promoter region of a eukaryotic gene consists of the transcription start site (TSS,
where RNA polymerase II binds), a core promoter called the TATA box (around 40 nucleobases
upstream from the transcription start site), upstream promoters (which differ in sequence
from one gene to another), and the enhancer (which may be found thousands of nucleobases
upstream or downstream). The TATA box is conserved among eukaryotic species; it is the
site where transcription factors (TATA box-binding proteins or TBP) bind to form a
complex that is necessary for transcription to take place. The upstream promoters can
bind different types of proteins to activate the gene. The enhancer binds special
types of proteins that bend the DNA so that the enhancer can contact the protein
complex on the promoter region.
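The splice-site rules described above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not a splice-site predictor: the sequences are hypothetical toy strings written in the DNA alphabet, and the function names `classify_intron` and `splice` are invented here for illustration. Intron positions are assumed to be known in advance.

```python
# Splice-site dinucleotide pairs named in the text (5' end, 3' end),
# written in the DNA alphabet.
SPLICE_SITE_CLASSES = {
    ("GT", "AG"): "GT-AG (most common)",
    ("GC", "AG"): "GC-AG (rare)",
    ("AT", "AC"): "AT-AC (rare)",
}


def classify_intron(intron):
    """Label an intron by the dinucleotides at its 5' and 3' ends."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to carry both splice sites")
    donor, acceptor = seq[:2], seq[-2:]
    return SPLICE_SITE_CLASSES.get((donor, acceptor), "non-canonical")


def splice(pre_mrna, introns):
    """Remove introns given as 0-based, end-exclusive (start, end) pairs."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position or end < start or end > len(pre_mrna):
            raise ValueError("overlapping or out-of-range intron")
        exons.append(pre_mrna[position:start])  # keep the exon before the intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the last exon
    return "".join(exons)


# Hypothetical toy transcript: exon 1, one intron, exon 2.
exon1, intron, exon2 = "ATGGCC", "GTAAGTCTTTCAG", "AAATGA"
pre_mrna = exon1 + intron + exon2
mature = splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))])

print(classify_intron(intron))
print(mature)
```

In this toy case, the spliced transcript is the two exons joined together, and it reads as a short ORF that starts with ATG and ends with the stop codon TGA.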
Prokaryotic genes are often organized into blocks called operons. Genes in a single operon
are regulated and expressed together. An operon contains genes that encode proteins
which together carry out a specific function. For instance, the bacterial genes required to
use lactose as an energy source are found next to each other in the lactose operon (lac operon).
The gene structure and regulation of prokaryotic genes are different from those of eukaryotic
ones. There are no introns in prokaryotic genes, and an operon, which contains several genes,
has a single promoter controlling the expression of all genes in that operon. There are
three types of regulatory molecules (repressors, activators, and inducers) that regulate expression of
genes in an operon.
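The idea that one promoter switches a whole operon on or off can be sketched as a small Python model. It is a deliberately simplified illustration, not a quantitative model of regulation. The Boolean rule in `promoter_is_active` is an assumption made here: a repressor blocks transcription unless an inducer is present, and an activator must be bound. The class and function names are hypothetical; the gene names lacZ, lacY, and lacA are the structural genes of the lac operon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operon:
    name: str
    promoter: str   # one shared promoter for the whole block of genes
    genes: tuple    # structural genes, in order along the DNA

    def expressed_genes(self, promoter_active):
        """All genes are transcribed together, or none are."""
        return self.genes if promoter_active else ()


def promoter_is_active(repressor_present, inducer_present, activator_bound=True):
    """Toy rule (an assumption of this sketch, not a measured model)."""
    # The repressor only blocks the promoter when no inducer is present.
    repressor_blocks = repressor_present and not inducer_present
    return activator_bound and not repressor_blocks


lac = Operon(name="lac", promoter="lac promoter", genes=("lacZ", "lacY", "lacA"))

# Repressor present, no inducer: nothing is expressed.
print(lac.expressed_genes(promoter_is_active(True, False)))
# Repressor present, inducer present: all three genes are expressed together.
print(lac.expressed_genes(promoter_is_active(True, True)))
```

The point of the sketch is the all-or-none behavior: because the genes share one promoter, a single regulatory decision applies to every gene in the operon.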
In general, the normally functioning cells of a living organism must be able to control
gene transcription by turning genes on and off based on the need of the cells for protein
synthesis and other functions. The process of turning on a gene so that it is transcribed
and a functional gene product is synthesized is known as gene expression. The activity of a gene is
measured by the amount of its transcript (mRNA). Laboratory techniques like northern
blot, serial analysis of gene expression (SAGE), quantitative PCR (qPCR), and DNA microarray
have been used to study gene expression. However, these techniques are now largely being
replaced by high-throughput RNA sequencing, which provides more information
than any of the other techniques.
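As a simple illustration of measuring gene activity by transcript amount, the Python sketch below converts raw read counts per gene into counts per million (CPM), so that samples sequenced to different depths can be compared. CPM is only one common normalization and is not prescribed by the text. The gene names and counts are hypothetical, and the function name `counts_per_million` is introduced here for illustration.

```python
def counts_per_million(counts):
    """Scale raw read counts so that they sum to one million per sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical read counts for four genes in one sample.
sample = {"geneA": 500, "geneB": 1500, "geneC": 0, "geneD": 8000}
cpm = counts_per_million(sample)

for gene, value in cpm.items():
    print(gene, value)
```

A gene with zero reads keeps a CPM of zero, and the relative order of the genes within a sample is unchanged; only the scale differs.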